Search Autocomplete is a widely-used feature in modern applications like Google, Amazon, and YouTube. It enhances user experience by providing real-time suggestions based on partial input, helping users complete queries faster and discover popular or relevant search terms.
As the user types a query character by character, the system should return the top N suggestions that match the current prefix. For example, typing "app" might yield results like "apple", "app store", or "application".
These suggestions are typically ranked by relevance, popularity, frequency, or recency.
In this chapter, we will explore the low-level design of a Search Autocomplete system in detail.
Let’s start by clarifying the requirements:
Before diving into the design, it’s important to clarify how the autocomplete system is expected to behave. Asking targeted questions helps refine assumptions, define the scope, and align on core expectations for the system.
Candidate: Should the autocomplete system be case-sensitive?
Interviewer: No, all inputs should be treated as lowercase. The system should be case-insensitive.
Candidate: Should the system only support English, or do we need to account for Unicode/multilingual input?
Interviewer: Let’s assume only English characters for now.
Candidate: How should suggestions be ranked—alphabetically, by frequency of use, or both?
Interviewer: Good question. The system should support both strategies. The user of the system should be able to configure the ranking strategy.
Candidate: How many suggestions should be returned per prefix?
Interviewer: That should be configurable, perhaps a default of 10, but the system should allow specifying a custom limit.
Candidate: How does the system learn word frequencies? Are we tracking every time a word is added or searched?
Interviewer: Let’s increment the frequency every time a word is inserted into the system..
Candidate: Can users input new words over time, or is the dictionary fixed at initialization?
Interviewer: Words can be added dynamically during runtime.
Candidate: Should we support deleting a word or updating its frequency?
Interviewer: No, we can skip delete and update functionality for now.
After gathering the details, we can summarize the key system requirements.
After the requirements are clear, the next step is to identify the core entities that we will form the foundation of our design.
Core entities are the fundamental building blocks of our system. We identify them by analyzing the functional requirements and highlighting the key nouns and responsibilities that naturally map to object-oriented abstractions such as classes, enums, or interfaces.
Let’s walk through the functional requirements and extract the relevant entities:
This requirement is central to the system's performance. A standard list or hash map would be inefficient for prefix-based searches. This points to a specialized data structure, the Trie.
Each node in the Trie represents a character, forming a tree-like structure of prefixes. Therefore, we need a TrieNode entity to be the basic building block and a Trie class to manage the overall structure, including word insertions and prefix searches. The TrieNode will also need to store the frequency of each completed word.
This involves two steps: finding all words that start with the given prefix and then packaging them for further processing. The Trie class is responsible for traversing from the prefix's end node to find all valid descendant words. To hold the results of this traversal, which includes both the word and its associated data (like frequency), we need a simple data-transfer object. This leads to the Suggestion entity, which encapsulates a word and its ranking weight.
These core entities define the key abstractions of the autocomplete system and will guide the structure of our low-level design and class diagrams.
These classes primarily act as data containers with minimal behavior.
Represents a single node in the Trie data structure. It's the fundamental unit for storing character-level information and word metadata.
Attributes:
A simple data transfer object (DTO) that encapsulates a word and its associated ranking weight, making it easy to pass around between the Trie collection logic and the ranking strategy.
Attributes:
Attributes:
Methods:
Attributes:
Methods:
The relationships define how our classes interact.
"has-a")AutocompleteSystem has a Trie and has a RankingStrategy. The AutocompleteSystem manages the lifecycle and coordinates the actions of these components.
Trie is composed of TrieNode objects. The Trie class is responsible for creating and linking TrieNodes to form the prefix tree structure.
"uses-a")AutocompleteSystem uses Suggestion objects as an intermediary data structure.
The RankingStrategy interface and its concrete implementations (FrequencyBasedRanking, AlphabeticalRanking) are a clear application of the Strategy Pattern. This decouples the core suggestion-finding logic within AutocompleteSystem from the ranking algorithm.
It allows us to change the ranking behavior at runtime or easily add new ranking methods (e.g., by recency, by location) without modifying the AutocompleteSystem class.
The AutocompleteSystemBuilder class implements the Builder Pattern. This pattern simplifies the creation of the AutocompleteSystem object, which has multiple configuration parameters.
It provides a fluent, readable API for constructing the object step-by-step and separates the complex construction logic from the final object representation.
The AutocompleteSystem class itself serves as a Facade. It provides a simple, high-level interface (AddWord, GetSuggestions) to a more complex underlying subsystem composed of the Trie, Suggestion objects, and the RankingStrategy. Clients interact with this simple facade, remaining unaware of the intricate logic of Trie traversal and suggestion ranking.
1class TrieNode:
2 def __init__(self):
3 self.children: Dict[str, TrieNode] = {}
4 self.is_end_of_word: bool = False
5 self.frequency: int = 0
6
7 def get_children(self) -> Dict[str, TrieNode]:
8 return self.children
9
10 def is_end_of_word_check(self) -> bool:
11 return self.is_end_of_word
12
13 def set_end_of_word(self, end_of_word: bool):
14 self.is_end_of_word = end_of_word
15
16 def get_frequency(self) -> int:
17 return self.frequency
18
19 def increment_frequency(self):
20 self.frequency += 11class Suggestion:
2 def __init__(self, word: str, weight: int):
3 self.word = word
4 self.weight = weight
5
6 def get_word(self) -> str:
7 return self.word
8
9 def get_weight(self) -> int:
10 return self.weight1class Trie:
2 def __init__(self):
3 self.root = TrieNode()
4
5 def insert(self, word: str):
6 current = self.root
7 for ch in word:
8 if ch not in current.get_children():
9 current.get_children()[ch] = TrieNode()
10 current = current.get_children()[ch]
11 current.set_end_of_word(True)
12 current.increment_frequency()
13
14 def search_prefix(self, prefix: str) -> Optional[TrieNode]:
15 current = self.root
16 for ch in prefix:
17 node = current.get_children().get(ch)
18 if node is None:
19 return None
20 current = node
21 return current
22
23 def collect_suggestions(self, start_node: TrieNode, prefix: str) -> List[Suggestion]:
24 suggestions = []
25 self._collect(start_node, prefix, suggestions)
26 return suggestions
27
28 def _collect(self, node: TrieNode, current_prefix: str, suggestions: List[Suggestion]):
29 if node.is_end_of_word_check():
30 suggestions.append(Suggestion(current_prefix, node.get_frequency()))
31
32 for ch in node.get_children().keys():
33 self._collect(node.get_children()[ch], current_prefix + ch, suggestions)1class RankingStrategy(ABC):
2 @abstractmethod
3 def rank(self, suggestions: List[Suggestion]) -> List[Suggestion]:
4 pass
5
6class AlphabeticalRanking(RankingStrategy):
7 def rank(self, suggestions: List[Suggestion]) -> List[Suggestion]:
8 return sorted(suggestions, key=lambda s: s.get_word())
9
10class FrequencyBasedRanking(RankingStrategy):
11 def rank(self, suggestions: List[Suggestion]) -> List[Suggestion]:
12 return sorted(suggestions, key=lambda s: s.get_weight(), reverse=True)1class AutocompleteSystem:
2 def __init__(self, ranking_strategy: RankingStrategy, max_suggestions: int):
3 self.trie = Trie()
4 self.ranking_strategy = ranking_strategy
5 self.max_suggestions = max_suggestions
6
7 def add_word(self, word: str):
8 self.trie.insert(word.lower())
9
10 def add_words(self, words: List[str]):
11 for word in words:
12 self.add_word(word)
13
14 def get_suggestions(self, prefix: str) -> List[str]:
15 prefix_node = self.trie.search_prefix(prefix.lower())
16 if prefix_node is None:
17 return []
18
19 raw_suggestions = self.trie.collect_suggestions(prefix_node, prefix.lower())
20 ranked_suggestions = self.ranking_strategy.rank(raw_suggestions)
21
22 return [s.get_word() for s in ranked_suggestions[:self.max_suggestions]]1class AutocompleteSystemBuilder:
2 def __init__(self):
3 self.ranking_strategy = FrequencyBasedRanking() # Default strategy
4 self.max_suggestions = 10 # Default limit
5
6 def with_ranking_strategy(self, strategy: RankingStrategy):
7 self.ranking_strategy = strategy
8 return self
9
10 def with_max_suggestions(self, max_suggestions: int):
11 self.max_suggestions = max_suggestions
12 return self
13
14 def build(self) -> AutocompleteSystem:
15 return AutocompleteSystem(self.ranking_strategy, self.max_suggestions)1def main():
2 print("----------- SCENARIO 1: Frequency-based Ranking -----------")
3
4 # 1. Build the system with the default frequency-based strategy
5 system_by_frequency = (AutocompleteSystemBuilder()
6 .with_max_suggestions(5)
7 .with_ranking_strategy(FrequencyBasedRanking())
8 .build())
9
10 # 2. Feed data into the system
11 # 'canada' is added most frequently, followed by 'car'
12 dictionary = [
13 "car", "cat", "cart", "cartoon", "canada", "candy",
14 "car", "canada", "canada", "car", "canada", "canopy", "captain"
15 ]
16 system_by_frequency.add_words(dictionary)
17
18 # 3. Get suggestions for a prefix
19 prefix1 = "ca"
20 suggestions1 = system_by_frequency.get_suggestions(prefix1)
21 print(f"Suggestions for '{prefix1}': {suggestions1}")
22
23 prefix2 = "car"
24 suggestions2 = system_by_frequency.get_suggestions(prefix2)
25 print(f"Suggestions for '{prefix2}': {suggestions2}")
26
27 print("\n----------- SCENARIO 2: Alphabetical Ranking -----------")
28
29 # 1. Build a new system with the alphabetical strategy
30 system_alphabetical = (AutocompleteSystemBuilder()
31 .with_ranking_strategy(AlphabeticalRanking())
32 .build())
33
34 # 2. Feed the same data
35 system_alphabetical.add_words(dictionary)
36
37 # 3. Get suggestions for the same prefix
38 suggestions3 = system_alphabetical.get_suggestions(prefix1)
39 print(f"Suggestions for '{prefix1}' (alphabetical): {suggestions3}")
40
41
42if __name__ == "__main__":
43 main()Which entity is most responsible for supporting efficient prefix-based search in a Search Autocomplete system?
in AutocompleteSystemBuilder there is no need for constructor, also the complete builder implementation does not seem to as per builder pattern, there is no nesting of classes like conventional builder.